\documentclass[12pt]{article}

\usepackage[includehead, margin = 2cm]{geometry}
\usepackage{setspace, graphicx, amssymb, amsmath, epsfig, natbib, array, multirow, hyperref, listings}
\usepackage{fancyhdr}

\pagestyle{fancyplain}


\lstset{
    language=R,
    basicstyle=\scriptsize\ttfamily,
    commentstyle=\ttfamily,
    numbers=left,
    numberstyle=\ttfamily\footnotesize,
    stepnumber=1,
    numbersep=5pt,
    backgroundcolor=\color{white},
    showspaces=false,
    showstringspaces=false,
    showtabs=false,
    frame=single,
    tabsize=2,
    captionpos=b,
    breaklines=true,
    breakatwhitespace=false,
    %title=\lstname,
    escapeinside={},
    keywordstyle={},
    morekeywords={}
    }

%Those are packages that I use to compile this document - if MikTeX asks you to install them, just agree.
\hypersetup{
    bookmarks=false,         % show bookmarks bar?
    unicode=false,          % non-Latin characters in Acrobat's bookmarks
    pdftoolbar=true,        % show Acrobat's toolbar?
    pdfmenubar=true,        % show Acrobat's menu?
    pdffitwindow=false,     % window fit to page when opened
    pdfstartview={FitH},    % fits the width of the page to the window
    pdftitle={},    % title
    pdfauthor={},     % author
    pdfsubject={Subject},   % subject of the document
    pdfcreator={Creator},   % creator of the document
    pdfproducer={Producer}, % producer of the document
    pdfkeywords={keywords}, % list of keywords
    pdfnewwindow=true,      % links in new window
    colorlinks=true,       % false: boxed links; true: colored links
    linkcolor=black,          % color of internal links
    linkbordercolor={0 0 0},  %border color
    citebordercolor={0 0 0},
    citecolor=black,        % color of links to bibliography
    filecolor=black,      % color of file links
    urlcolor=blue,         % color of external links
}


\title{Replication Project Draft}
\author{
	Andrew Ballard \\
	PhD Student in Political Science \\
	\emph{Duke University} \\
	}


\begin{document}

\maketitle

\section{Summary of Gelman, Shor, Bafumi, and Park (2007). Rich State, Poor State, Red State, Blue State: What's the Matter with Connecticut?}

Gelman et al. investigate the recent journalistic tendency to portray the Democrats as the party of rich elites rather than the party of the masses, which was the dominant view throughout the 20th century. To do so, they fit a number of logistic regression models to National Election Study (NES) and National Annenberg Election Survey (NAES) data on income, presidential vote, and demographic characteristics from a number of years. Their main findings are that richer states tend to support Democrats, but that within each state, higher income increases the probability of voting Republican. However, the within-state effect of income is weaker in traditionally ``blue'' states.

Below I present a discussion of the methodology used by Gelman et al., a replication of their main findings, and detailed plans to extend the project. 



\section{Methodology and Replication}

The data presentation in this paper consists of 8 figures, 5 of which I have replicated below. The code used in replicating the figures, as well as all data used in the paper, can be found at my dataverse: \url{http://dvn.iq.harvard.edu/dvn/dv/ballardao}. I have had the data for this paper for less than a week, so much of my time has been spent familiarizing myself with them. Thus, my progress at this point includes only a replication, not an extension.\footnote{I had originally planned to replicate Jerit, J. and Barabas, J. (2012), Partisan Perceptual Bias and the Information Environment. However, I ran into a number of roadblocks. At first, I tried to run the analysis on my laptop, which was fruitless because the datasets were so large. Next, I wrestled with getting the authors' code to run on the SSRI computer cluster, with no luck. Finally, I enlisted the help of people much more knowledgeable than I am about Stata and data analysis, to no avail. This was the last straw, and I set out to find a suitable replacement study. I was assigned to read Gelman et al. for the February 25th meeting of the Behavior Core course, and Dr. Andrew Gelman was kind enough to send me the code and data for this project. He answered my e-mail within hours, and on a Sunday. I owe him a debt of gratitude.} In particular, I have spent a great deal of time trying to understand the authors' code. They implement logistic regression in ways that are new to me, and I am still working through the details. At present, all figures below were created using the authors' code, nearly in its original form. I did have to make a few changes to get R to recognize some of the objects from Stata, but these were minor and took only a few hours of debugging.
One of my plans for moving forward is to rewrite as much of the code as possible myself, although I suspect I will still rely on a great deal of the authors' code.

\subsection{Figures 1 and 2}

To create Figure 1, the authors first fit a logistic regression model predicting Republican vote share by state as a function of average state income (in tens of thousands of 1996 US dollars). Given the continuous nature of vote share, this model is a more logical choice than something like a probit or ordered probit. The results are presented as income regression coefficients and standard errors from 1952 to 2004 for all states, southern states, and non-southern states. The downward trend shows that in recent elections, Republicans have received a higher share of the vote in poor states than in rich states. This effect is more pronounced for southern states than for non-southern states, which is consistent with the widely observed disappearance of the Southern Democrat in the 1950s and 60s.

\begin{figure}[h!]
\centering
\caption{Replication of regression coefficients and standard errors for Republican vote share as a function of average state income}
\includegraphics[width=6in]{"Figure1"}
\end{figure}

Figure 2 is similar in methodology to Figure 1, except that the logistic regression uses individual votes and individual income rather than state-level quantities. Since the dependent variable (Vote Republican; Yes = 1, No = 0) is binary, logistic regression is an appropriate method. The pattern in Figure 2 clearly shows that, although rich states tend to have a lower Republican vote share, wealthier individuals are overall more likely to vote Republican.

\begin{figure}[h!]
\centering
\caption{Replication of regression coefficients and standard errors for Republican vote as a function of individual income}
\includegraphics[width=6in]{"Figure2"}
\end{figure}
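
For reference, the individual-level model described above can be written out as follows. This is a sketch in my own notation rather than something taken from the authors' code, and I am assuming that the model is fit separately for each election year $t$: $y_i$ equals 1 if respondent $i$ reports voting Republican and 0 otherwise, and $x_i$ is the respondent's income.
\begin{equation*}
\Pr(y_i = 1) = \operatorname{logit}^{-1}(\alpha_t + \beta_t x_i), \qquad \operatorname{logit}^{-1}(z) = \frac{1}{1 + e^{-z}}.
\end{equation*}
Under this reading, Figure 2 plots the estimate of $\beta_t$ and its standard error against $t$, and a positive $\beta_t$ means that wealthier respondents are more likely to vote Republican in that year.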

\subsection{Figures 3, 4, and 5}

It may seem paradoxical that rich states show more support for Democrats while rich people tend to vote Republican. To untangle this puzzle, we turn to Figures 3 and 4, which fit a multilevel model to Annenberg poll data from 2000 and 2004 using the \texttt{lmer()} function in R. To create the plots themselves, the authors wrote a 189-line function called ``Superplot'', which I do not yet fully understand. Figure 3 shows a varying-intercept (but not varying-slope) multilevel model. The three lines represent Mississippi (the poorest state, historically dominated by the Republicans), Ohio (a ``middle of the road'' income state, usually a swing state), and Connecticut (the richest state, perennially ``blue''). The x-axis values are a recoded version of a five-point quantile-based NES scale.\footnote{The x-axis values are as follows: $-2$ = 0--16 percentile, $-1$ = 17--33 percentile, 0 = 34--67 percentile, 1 = 68--95 percentile, and 2 = 96--100 percentile.} The y-axis shows the probability of voting Republican, that is, for President George W. Bush. Within each state, wealthier voters tend to vote Republican, but richer states are more likely to support Democrats.

\begin{figure}[h!]
\centering
\caption{Replication of the probability of voting Republican as a function of individual income in 2000 and 2004 -- Varying intercepts}
\includegraphics[width=4in]{"Figure3"}
\end{figure}
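
To make the structure of the varying-intercept model behind Figure 3 concrete, it can be sketched as below. The notation is my own, and I am assuming the simplest form of the model; the authors' specification may also include state-level predictors for the intercepts, which I omit here. Respondent $i$ lives in state $j[i]$, $y_i$ equals 1 for a Republican vote, and $x_i$ is the five-point income score.
\begin{equation*}
\Pr(y_i = 1) = \operatorname{logit}^{-1}\!\left(\alpha_{j[i]} + \beta x_i\right), \qquad \alpha_j \sim \mathrm{N}(\mu_\alpha, \sigma_\alpha^2), \quad j = 1, \dots, 50.
\end{equation*}
Each state has its own intercept $\alpha_j$ but shares the common slope $\beta$, which is why the three curves in Figure 3 are shifted versions of one another on the logit scale.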

Figure 4 is similar to Figure 3, except that now both the intercept and the slope of the probability of voting Republican as a function of individual income are allowed to vary by state. This analysis reveals that income matters more in poorer states and less in richer states, a pattern evaluated further in Figure 5.

\begin{figure}[h!]
\centering
\caption{Replication of the probability of voting Republican as a function of individual income in 2000 and 2004 -- Varying slopes and intercepts}
\includegraphics[width=4in]{"Figure4"}
\end{figure}
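
The varying-intercept, varying-slope model behind Figure 4 can be sketched in the same way. Again the notation is my own and the form is an assumption about the simplest version of the model: respondent $i$ lives in state $j[i]$, $y_i$ equals 1 for a Republican vote, $x_i$ is the income score, and $\Sigma$ is a $2 \times 2$ covariance matrix for the state-level intercepts and slopes.
\begin{equation*}
\Pr(y_i = 1) = \operatorname{logit}^{-1}\!\left(\alpha_{j[i]} + \beta_{j[i]} x_i\right), \qquad
\begin{pmatrix} \alpha_j \\ \beta_j \end{pmatrix} \sim \mathrm{N}\!\left( \begin{pmatrix} \mu_\alpha \\ \mu_\beta \end{pmatrix}, \Sigma \right).
\end{equation*}
The only change from a varying-intercept model is that the slope $\beta_j$ now carries a state index, so the effect of income can be steeper in some states than in others.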

The three states chosen in Figures 3 and 4 are good choices, but they are not representative of all 50 states. Thus, we turn to Figure 5 to look at the estimated slopes from the multilevel model for all 50 states. There is a clear negative relation between a state's slope (from the varying-slope model presented in Figure 4) and the state's median income. This means that income matters less in determining the probability of voting Republican in rich states than in poor states. However, this effect appears with much more clarity in 2000 than in 2004.

\begin{figure}[h!]
\centering
\caption{Replication of slope estimates for all 50 states as a function of income}
\includegraphics[width=4in]{"Figure5"}
\end{figure}
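
As a rough illustration of how state-level slopes like those in Figure 5 could be obtained, the sketch below fits a varying-intercept, varying-slope logistic model to simulated data and plots the estimated slopes against state income. It is not the authors' code: the data, the variable names (\texttt{vote\_rep}, \texttt{income}, \texttt{state}, \texttt{state\_inc}), and the true parameter values are all made up, and I use \texttt{glmer()} because, as far as I know, current versions of the \texttt{lme4} package handle binomial models through \texttt{glmer()} rather than \texttt{lmer()}.

\begin{lstlisting}
library(lme4)  # provides glmer()

set.seed(1)
n_states <- 50
n_per    <- 200

# Hypothetical standardized state income and true state-level effects
state_inc <- rnorm(n_states)
alpha <- -0.3 * state_inc + rnorm(n_states, sd = 0.3)        # intercepts
beta  <- 0.3 - 0.1 * state_inc + rnorm(n_states, sd = 0.05)  # slopes

# Simulated respondents: five-point income score and a binary vote
dat <- data.frame(
  state  = factor(rep(seq_len(n_states), each = n_per)),
  income = sample(-2:2, n_states * n_per, replace = TRUE)
)
j <- as.integer(dat$state)
dat$vote_rep <- rbinom(nrow(dat), 1, plogis(alpha[j] + beta[j] * dat$income))

# Intercept and income slope both vary by state
fit <- glmer(vote_rep ~ income + (1 + income | state),
             data = dat, family = binomial)

# One estimated income slope per state, in the order of levels(dat$state)
slopes <- coef(fit)$state[, "income"]

plot(state_inc, slopes,
     xlab = "State income (standardized, simulated)",
     ylab = "Estimated income slope")
abline(lm(slopes ~ state_inc))
\end{lstlisting}

Because the simulated slopes are constructed to shrink as state income rises, the fitted line should slope downward, which mimics the qualitative pattern described above.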

\section{Extension Ideas}

Gelman et al. provide some valuable ideas in this paper, but I believe there are a number of ways to improve their findings. 

\subsection{Is it Really Income?  If So, How?}

I am unconvinced that income is the driving force behind vote choice. I tend to favor explaining complex decisions like voting with a number of variables, including income but also race, gender, education, geography, and religion. The authors partially account for some of these, such as race and education, in their model. However, I believe additional analyses could improve the results. Specifically, I would like to see what effect religion has on these results.

Let us assume for a moment that the results do not change after performing the additional analyses mentioned above. What then? Gelman et al. standardize income to 1996 US dollars, but they do not adjust it for differences in purchasing power across states. A dollar goes much further in Arkansas than it does in Connecticut, and adjusting for cost of living could have striking implications for the results of this paper. Adjusting for cost of living is straightforward, and I plan to include it in my extension.
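
One simple way to make this adjustment, sketched below in R, is to divide each respondent's income by a state-level cost-of-living index. The data frames \texttt{resp} and \texttt{col\_index}, their column names, and all index values are hypothetical placeholders that I made up for illustration; they are not data from Gelman et al.

\begin{lstlisting}
# Hypothetical respondents: state of residence and income in 1996 USD
resp <- data.frame(
  state       = c("AR", "CT", "OH"),
  income_1996 = c(30000, 30000, 30000)
)

# Hypothetical cost-of-living index (national average = 100);
# these values are invented for the example
col_index <- data.frame(
  state = c("AR", "CT", "OH"),
  col   = c(90, 125, 95)
)

# Attach the index to each respondent and deflate income by it
resp <- merge(resp, col_index, by = "state")
resp$income_adj <- resp$income_1996 / (resp$col / 100)

resp
\end{lstlisting}

With an adjustment of this kind, the same nominal income counts as a higher real income in a cheap state than in an expensive one, and the adjusted variable could then replace raw income in the models above.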

\subsection{Presentation, Presentation, Presentation}
Some of the figures are straightforward, but others are somewhat confusing. In particular, I would like to do a better job presenting the findings of Figure 5. I find it difficult to translate ``Slope'' on the y-axis into the slope of a line in space in my head. I think it could be valuable to present all 50 states in a similar fashion to Figures 3 and 4, but to color-code them along a continuum to show the relative wealth of different states. It would also be interesting to try presenting the data in different ways. I think that separation plots would be a worthwhile investment for showing how well the model predicts Republican votes.

\subsection{Validity and Simulated Scenarios}
Lastly, I plan to include in my extension a discussion of the validity of the model in predicting presidential votes, in the form of a cross-validation, and to use the validated model to run simulations of scenarios for a number of different types of states. The simulations will include more than one independent variable, such as those discussed in subsection 3.1 above.
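
As a minimal sketch of what such a cross-validation might look like, the code below runs a 5-fold cross-validation of a simple logistic regression on simulated data and scores the out-of-sample predictions with the Brier score (the mean squared difference between predicted probability and actual outcome). The data, the variable names, the choice of five folds, and the use of a single-level \texttt{glm()} in place of the multilevel model are all simplifying assumptions on my part.

\begin{lstlisting}
set.seed(2)
n <- 1000

# Simulated respondents: five-point income score and a binary vote
dat <- data.frame(income = sample(-2:2, n, replace = TRUE))
dat$vote_rep <- rbinom(n, 1, plogis(-0.1 + 0.3 * dat$income))

# Randomly assign each respondent to one of k folds
k    <- 5
fold <- sample(rep(seq_len(k), length.out = n))

brier <- numeric(k)
for (f in seq_len(k)) {
  train <- dat[fold != f, ]
  test  <- dat[fold == f, ]

  # Fit on the training folds, predict on the held-out fold
  fit <- glm(vote_rep ~ income, data = train, family = binomial)
  p   <- predict(fit, newdata = test, type = "response")

  # Brier score for the held-out fold (lower is better)
  brier[f] <- mean((p - test$vote_rep)^2)
}

mean(brier)  # average out-of-sample Brier score
\end{lstlisting}

Comparing this average score across candidate models (for example, with and without religion as a predictor) would give a rough sense of which specification predicts held-out votes better.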

\pagebreak
\section*{Appendix - Gelman et al. Replication R Code}

\subsection*{Figures 1 and 2}
\lstinputlisting{./ReplicationFigure1Figure2.R}

\pagebreak
\subsection*{Superplot Function}
\lstinputlisting{./superplotfunction.R} 

\pagebreak
\subsection*{Figures 3, 4, and 5}
\lstinputlisting{./blueredfigs345.R}


\end{document}


